ABSTRACT

Genome duplication in free-living cellular organisms 
is performed by DNA replicases that always include 
a DNA polymerase, a DNA sliding clamp and a clamp 
loader. What are the evolutionary solutions for DNA 
replicases associated with smaller genomes? Are 
there some general principles? To address these 
questions we analyzed DNA replicases of double- 
stranded (ds) DNA viruses. In the process we dis- 
covered highly divergent B-family DNA polymerases 
in phiKZ-like phages and remote sliding clamp 
homologs in Ascoviridae family and Ma-LMM01 
phage. The analysis revealed a clear dependency 
between DNA replicase components and the viral 
genome size. As the genome size increases, viruses 
universally encode their own DNA polymerases and 
frequently have homologs of DNA sliding clamps, 
which sometimes are accompanied by clamp loader 
subunits. This pattern is highly non-random. The 
absence of sliding clamps in large viral genomes 
usually coincides with the presence of atypical poly- 
merases. Meanwhile, sliding clamp homologs, not 
accompanied by clamp loaders, have an elevated 
positive electrostatic potential, characteristic of 
non-ring viral processivity factors that bind the 
DNA directly. Unexpectedly, we found that similar 
electrostatic properties are shared by the eukaryotic 
9-1-1 clamp subunits, Hus1 and, to a lesser extent, 
Rad9, also suggesting the possibility of direct DNA 
binding. 

INTRODUCTION 

DNA replication is one of the most fundamental processes 
in all living entities. The replication of genomic DNA has 
to be not only accurate but also very efficient. To achieve 
this, free-living organisms from all three domains of life
and some viruses use multicomponent protein machines
termed DNA replicases. A DNA replicase consists of a 
DNA polymerase and accessory subunits including a 
DNA sliding clamp and a clamp loader (1). A sliding 
clamp is a ring-shaped polymerase processivity factor, 
which needs to be loaded onto the DNA by a clamp 
loading complex. Once loaded, a DNA sliding clamp en- 
circles the DNA double helix serving as a mobile tether for 
the replicative DNA polymerase. The attachment to the 
DNA-loaded sliding clamp transforms the polymerase 
into an extremely processive enzyme that can synthesize 
thousands of nucleotides without falling off the DNA (2). 

It is striking that despite the mechanistic uniformity of 
replication of the DNA double helix, the replicative DNA 
polymerases, central players in this process, are not uni- 
versally conserved. Both sequence (3) and structure (4,5) 
analyses led to the conclusion that bacterial replicative 
polymerases on one hand and eukaryotic/archaeal poly- 
merases on the other hand evolved independently from 
different ancestral proteins. The catalytic α-subunit of
the bacterial replicative polymerase (polIIIα) belongs to
the C-family of DNA polymerases (PolC). Eukaryotic
and archaeal replicative polymerases belong to the unre- 
lated B-family (PolB). In addition, a unique D-family 
polymerase was found to participate together with a 
B-family polymerase in DNA replication in euryarchaea 
(6-8). In dsDNA viruses the diversity of replicative poly- 
merases is even larger. In addition to canonical B-family 
polymerases that initiate DNA synthesis from the 3' 
terminus of the RNA primer (PolBr), some viruses 
encode protein-primed DNA B-family polymerases 
(PolBp) that use a hydroxyl group supplied by a protein 
(9). A-family DNA polymerases (PolA), which play only a
limited, specialized role in DNA synthesis in cellular organisms,
also participate in viral genome replication (10).
Although distinct, the A-family is distantly related to the 
B-family (11). Interestingly, while B- and A-family repli- 
cative DNA polymerases in dsDNA viruses are common, 
C-family polymerases have been detected only in a 
handful of bacteriophages (12). 

In contrast to the disparity of replicative DNA polymerases,
their processivity factors (DNA sliding clamps) are
conserved in all cellular organisms and T4-like phages
(13). The bacterial DNA sliding clamp (polIIIβ) is a
homodimer, while eukaryotic and archaeal Proliferating
Cell Nuclear Antigen (PCNA) is a homotrimer, with a few
archaea having a heterotrimeric PCNA (14). The gp45
sliding clamp in T4-like phages, like eukaryotic and the 
majority of archaeal PCNAs, is a homotrimer. Eukaryotes 
also have an additional PCNA-like heterotrimeric DNA 
sliding clamp, the 9-1-1 complex, which specializes in 
DNA repair processes (15). Despite differences in oligo- 
meric state (dimer or trimer) all these DNA sliding clamps 
represent structurally similar rings with pseudo 6-fold 
symmetry and a central hole large enough to fit the 
DNA double helix. Replicative DNA polymerases and 
other proteins usually interact with DNA sliding clamps 
through the hydrophobic pocket formed by the inter- 
domain connector (16). In addition to the ring-shaped 
gp45 DNA sliding clamp in T4-like phages, the viral world
has produced alternative solutions for increasing
DNA replication processivity. For instance, processivity
factors in herpesviruses are structurally similar to PCNA and
gp45 and have the same domain composition, but they do not form rings. UL42 acts as a
monomer representing one-third of a ring (17), while 
UL44 and BMRF1 form C-shaped dimers that corres- 
pond to two thirds of a ring (18,19). Another virus-specific 
example is the recruitment of a host protein, unrelated to 
DNA sliding clamps (Escherichia coli thioredoxin), to 
serve as the DNA polymerase processivity factor in the 
T7 phage (20). 

Ring-type DNA sliding clamps need protein complexes 
known as clamp loaders for their loading onto DNA (1). 
All subunits of cellular clamp loaders belong to the AAA+ 
protein superfamily. Although the exact subunit compos- 
ition may differ, the core of all known clamp loaders is a 
pentameric protein complex with at least one subunit 
being different from the remaining four. Archaeal and eu- 
karyotic clamp loaders are quite similar. They are 
composed of one large and four small subunits. In eukary- 
otes all four small subunits are different, while in archaea 
they usually are identical or, in a few cases, are represented
by two types (21). The bacterial clamp loader consists of δ,
δ' and three copies of the γ/τ subunit. A clamp loader in
T4-like phages is composed of four copies of gp44 and a 
single copy of gp62 protein. 

In free-living cellular organisms the combination of a 
DNA polymerase and a DNA sliding clamp with its 
loader appears to be a universal solution to the replicase 
processivity problem (1). In contrast, many dsDNA 
viruses do not encode processivity factors, and some do 
not even have their own DNA polymerases, totally relying 
on DNA replication machinery of the host. Could it be 
that the size of a genome is an important factor 
determining the need for a processive DNA replicase? 
Perhaps there is an approximate genome size threshold, 
above which the processivity properties of a replicase 
become critical? dsDNA viruses are an excellent model 
group for addressing such fundamental questions as they 
represent a wide range of genome sizes (from ~5 up to
~1200 kb) and a large variety of genome replication
strategies. 

In this study, using data derived from the sequenced 
genomes of dsDNA viruses, we examined the presence 
and the type of viral DNA replicases in the context of 
their genome size. To this end we used sensitive 
homology detection methods to identify DNA polymer- 
ases, processivity factors and clamp loaders encoded in 
viral genomes. We detected a number of previously 
uncharacterized components of DNA replicases and 
explored their properties using a variety of computational 
methods. Our results establish that the presence and the 
type of DNA replicase components are linked with the 
viral genome size. 

MATERIALS AND METHODS 

Viral databases 

Viral protein and genome data were downloaded from
the NCBI URLs 'http://www.ncbi.nlm.nih.gov/protein/?term=dsDNA+viruses,+no+RNA+stage'
and 'http://www.ncbi.nlm.nih.gov/genomes/GenomesGroup.cgi?taxid=35237',
respectively. Family Polydnaviridae was
excluded from the analysis because these viruses have a
distinct genome organization (split into small segments), and
their genome acts only as a vector for transmission of
parasitic wasp genes (22).

Genome filtering 

To obtain a more representative genome set, highly similar 
genomes were removed. All genomes were compared 
to each other using LAST (v128) (23), and genomes with
local sequence identity >70% were filtered out. Repetitive 
genomic regions were identified and ignored during the 
comparison. 
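
A minimal sketch of such redundancy filtering is shown below. It assumes that pairwise local identities (in percent, for example parsed from LAST alignments after masking repeats) are already available; the function name `filter_redundant` and the greedy keep-first policy are illustrative assumptions, not necessarily the exact procedure used in this study.

```python
from typing import Dict, List, Tuple


def filter_redundant(
    genome_ids: List[str],
    identity: Dict[Tuple[str, str], float],
    max_identity: float = 70.0,
) -> List[str]:
    """Greedily keep genomes whose identity to every kept genome is <= max_identity.

    identity maps an (id_a, id_b) pair to percent local identity; a pair may be
    stored in either order. Genomes are visited in the given order, so the caller
    decides which representative survives (hypothetical policy).
    """
    kept: List[str] = []
    for gid in genome_ids:
        redundant = False
        for other in kept:
            pid = identity.get((gid, other), identity.get((other, gid), 0.0))
            if pid > max_identity:  # ">70% identity" means filtered out
                redundant = True
                break
        if not redundant:
            kept.append(gid)
    return kept


# Hypothetical example: B is 85% identical to A; C is dissimilar to both.
pairs = {("A", "B"): 85.0, ("A", "C"): 40.0, ("B", "C"): 42.0}
print(filter_redundant(["A", "B", "C"], pairs))  # ['A', 'C']
```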

Genome translations 

All the genomes of dsDNA viruses were subjected to
six-frame translation using Virtual Ribosome (24) and the
standard genetic code translation table. In addition to 
annotated open reading frames (ORF), all previously un- 
assigned ORFs longer than 60 residues were retained for 
further analysis. 
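
As a rough illustration of the translation and length filter, the Python sketch below translates all six frames with the standard genetic code (NCBI table 1) and keeps segments longer than 60 residues. Treating an ORF as a stop-to-stop segment, and the helper names, are assumptions for illustration; Virtual Ribosome has its own ORF definitions and options.

```python
from typing import List

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_orfs(seq: str, min_len: int = 60) -> List[str]:
    """Return stop-to-stop peptide segments longer than min_len from all six frames."""
    seq = seq.upper()
    orfs: List[str] = []
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            peptide = translate(strand[frame:])
            orfs.extend(p for p in peptide.split("*") if len(p) > min_len)
    return orfs


orfs = six_frame_orfs("ATG" + "GCT" * 61 + "TAA")
print("M" + "A" * 61 in orfs)  # True
```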

Sequence similarity searches and the identification of 
conserved domains 

Standard sequence searches were performed using BLAST 
and PSI-BLAST (25) with default parameters in non- 
redundant (nr) databases installed locally and updated 
weekly. To identify conserved domains in viral protein 
and ORF sequences, each of them was searched against 
the CDD profile database (26) using RPS-BLAST (25) 
with default parameters. In addition, profile Hidden 
Markov Models (HMMs) were constructed for each 
viral sequence for searches against the library of profile 
HMMs of known protein structures (PDB). Profile 
HMMs were constructed using the buildali.pl script and 
HHmake algorithm from the HHsearch (v1.5.0) software
suite (27). The profile HMM construction with buildali.pl
included running three iterations of PSI-BLAST search
against the nr90 (nr filtered to maximum 90% sequence
identity) database using the E-value = 1e-03 inclusion
threshold. HHsearch with default parameters was then 
used to search the pdb70 database of profile HMMs 
(ftp://toolkit.lmb.uni-muenchen.de/HHsearch/databases/) 
installed locally. In addition, locally generated profiles 
(profile HMMs) for individual DNA replicase compo- 
nents from multiple sequence alignments were appended 
to CDD and pdb70 databases. RPS-BLAST and
HHsearch hits with E < 0.1 and probability >50%,
respectively, were extracted from the results and analyzed
for the presence of DNA polymerases, DNA sliding 
clamps and clamp loaders. Unreliable hits to replicase 
components were further validated with additional 
approaches such as COMA server (28) or GeneSilico 
MetaServer (29). 
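
The score cut-offs above can be expressed as a simple filter. The `Hit` record and the `select_hits` helper below are hypothetical; they assume the RPS-BLAST and HHsearch outputs have already been parsed and do not reproduce the subsequent assignment of hits to replicase components.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Hit:
    query: str
    target: str
    tool: str                            # "rpsblast" or "hhsearch"
    evalue: Optional[float] = None       # used for RPS-BLAST hits
    probability: Optional[float] = None  # HHsearch probability, in percent


def passes_threshold(hit: Hit) -> bool:
    """Cut-offs from the text: E < 0.1 (RPS-BLAST) and probability > 50% (HHsearch)."""
    if hit.tool == "rpsblast":
        return hit.evalue is not None and hit.evalue < 0.1
    if hit.tool == "hhsearch":
        return hit.probability is not None and hit.probability > 50.0
    return False


def select_hits(hits: Iterable[Hit]) -> List[Hit]:
    return [h for h in hits if passes_threshold(h)]


hits = [
    Hit("orf1", "PolB_profile", "rpsblast", evalue=1e-5),
    Hit("orf2", "PolB_profile", "rpsblast", evalue=0.5),
    Hit("orf3", "1ig9", "hhsearch", probability=89.0),
    Hit("orf4", "1ig9", "hhsearch", probability=30.0),
]
print([h.query for h in select_hits(hits)])  # ['orf1', 'orf3']
```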

Sequence clustering 

DNA replicase components were clustered according to 
their pairwise similarity using CLANS (30). The similarity 
in CLANS is represented with P-values derived from 
BLAST or PSI-BLAST E-values. For clustering divergent 
proteins (all DNA polymerases and all DNA sliding 
clamps), their pairwise similarity was quantified using 
PSI-BLAST. For each sequence, CLANS was configured 
to run two iterations of PSI-BLAST using the E = 1e-03
inclusion threshold against the reference database (nr80) 
to generate a sequence profile. The last PSI-BLAST iter- 
ation with the obtained profile was performed against the 
database of sequences to be clustered. In our case this was 
the database of either viral DNA polymerases or sliding 
clamps. To partition the largest subset of B-family
polymerases (the PolBrCore cluster) into distinct groups,
clustering in CLANS was based on a direct all-against-all
BLAST sequence comparison.
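
CLANS works with P-values rather than E-values. The sketch below assumes the standard BLAST relation P = 1 - exp(-E) for the conversion and keeps only pairs with P < 1e-05 (the line threshold used for the DNA polymerase cluster map). Keeping the smaller P-value of the two search directions is an illustrative assumption, not necessarily how CLANS symmetrizes scores.

```python
import math
from typing import Dict, List, Tuple


def evalue_to_pvalue(evalue: float) -> float:
    """BLAST relation P = 1 - exp(-E); -expm1(-E) keeps precision for tiny E."""
    return -math.expm1(-evalue)


def similarity_edges(
    evalues: Dict[Tuple[str, str], float], p_cutoff: float = 1e-5
) -> List[Tuple[str, str, float]]:
    """Return (a, b, P) for unordered pairs whose best P-value is below p_cutoff."""
    best: Dict[Tuple[str, str], float] = {}
    for (a, b), e in evalues.items():
        if a == b:
            continue  # self-hits carry no clustering information
        key = (a, b) if a < b else (b, a)
        best[key] = min(evalue_to_pvalue(e), best.get(key, 1.0))
    return sorted((a, b, p) for (a, b), p in best.items() if p < p_cutoff)


edges = similarity_edges({("a", "b"): 1e-8, ("b", "a"): 1e-3, ("a", "c"): 0.5})
print(len(edges))  # 1
```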

Multiple sequence alignments 

Multiple sequence alignments were constructed with 
MAFFT (31) optimized for accuracy (parameter 
L-INS-i). If sequences had homologs with known structures,
PROMALS3D (32) with default parameters was
used instead.

Homology modeling 

Alignments between the sequence to be modeled (target) 
and a related structure (template) were constructed with 
PSI-BLAST-ISS (33), COMA server (28) or GeneSilico 
MetaServer (29). Uncertain alignment regions were modified
manually during an iterative modeling process (34).
Protein 3D models were constructed from target-template 
alignments using Modeller 9v7 (35). Models were eva- 
luated visually for significant flaws. In addition, the 
model quality was estimated using ProsaWeb (36) by 
comparing Prosa Z-scores of models with those of corres- 
ponding templates. 

Analysis of electrostatic properties 

Calculation of theoretical isoelectric points (pIs) for DNA
sliding clamps and their homologs was performed using
the 'Isoelectric point' program from the EMBOSS
software package (37). Sequences of sliding clamps and
their homologs were collected by performing PSI-BLAST
searches against the nr70 database. Non-conserved
N- and C-termini were removed from the
sequences before the pI calculation. Surface electrostatic
potential maps were computed with APBS (v1.2.1), which
was accessed through the PyMOL APBS Tools2 plug-in
(http://www.pymolwiki.org/index.php/APBS). Prior to
computation, all heteroatoms and water molecules were
removed from the PDB files. Both models and PDB structures
were prepared for calculations using PDB2PQR (v1.5)
(38) with the AMBER force field.
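
For orientation, a theoretical pI can be computed by summing Henderson-Hasselbalch charges of the ionizable groups and bisecting for the pH at which the net charge is zero. The pKa values below are an illustrative assumption and may differ from the table shipped with EMBOSS, so absolute pI values from this sketch should not be compared directly with EMBOSS output.

```python
from typing import Dict

# Illustrative pKa values (assumption; EMBOSS uses its own pK data file).
PKA_POSITIVE: Dict[str, float] = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE: Dict[str, float] = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of a (trimmed) protein sequence at a given pH."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = 1
    counts["Cterm"] = 1
    positive = sum(counts[g] / (1.0 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1.0 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect on pH in [0, 14]; net charge decreases monotonically with pH."""
    seq = seq.upper()
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(isoelectric_point("KKKKK") > isoelectric_point("DEDED"))  # True
```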

RESULTS 

DNA replicase components and the genome size 

We analyzed the available fully sequenced genomes of 
dsDNA viruses for the presence of DNA replicase com- 
ponents. In all, genomes of 808 viruses including 458 
(57%) bacteriophages, 317 (39%) eukaryotic and 33 
(4%) archaeal viruses were analyzed. Specifically, we 
looked for DNA polymerases, polymerase processivity fac- 
tors (DNA sliding clamps) and clamp loader subunits. We 
detected DNA polymerases in about half of the analyzed 
viral genomes. In addition to either known or previously 
annotated enzymes, for the first time we identified highly 
divergent DNA polymerases in phiKZ-like bacterio- 
phages. We found a significantly smaller fraction of 
genomes (<20%) coding for homologs of DNA sliding 
clamps that may serve as DNA polymerase processivity 
factors. We discovered previously unrecognized remote homologs of cellular
DNA sliding clamps in Microcystis phage Ma-LMM01
and the Ascoviridae family. DNA sliding clamps that
form rings (PCNA, polIIIβ, gp45) need a multimeric
clamp loader for their loading onto DNA. In line with
this prerequisite, we detected clamp loader subunits only
in genomes carrying genes of DNA sliding clamp
homologs. Yet, surprisingly, not all PCNA or polIIIβ
homologs are accompanied by clamp loader subunits.

Overall, the results revealed a great variety of DNA 
replicase components and their combinations in dsDNA 
viruses (for the complete list see File 1 in Supplementary 
Data). The variety is much larger than in all three
domains of cellular life combined and at first sight lacks
any discernible pattern. However, we reasoned that if an
increase in viral genome size requires improved
processivity properties of a DNA replicase, we should be
able to detect this dependency even in the face of this
overwhelming variety. Indeed, the arrangement of viral 
taxonomic groups according to their average genome size 
revealed a clear trend (Figure 1). Viruses with the smallest
genomes (<40 kb) either have a B-family protein-primed
DNA polymerase or do not have a DNA polymerase at
all. Viruses with larger genomes (40-140 kb) have their
own DNA polymerases more often. These polymerases
usually belong to the A-family and rarely to the B- or C-family.
Viruses with the largest genomes (>140 kb) always encode DNA
polymerases (most often B-family RNA-primed), frequently
have processivity factors and sometimes clamp
loader subunits.

Figure 1. DNA replicase components in dsDNA viral genomes. Viral taxonomic groups are arranged by their average genome size. DNA pol., DNA
polymerase type; PolA, A-family; PolBr, B-family DNA polymerase that uses RNA as a primer; PolBp, B-family DNA polymerase that uses protein
as a primer; PolC, C-family. Coloring scheme: white, no polymerases found; green, PolBp; yellow, PolA; gray, PolBr; pink, PolC. Newly identified
replicase components are labeled in bold red font. Processivity factors non-homologous to the cellular ones are underlined. A minus sign indicates
that the processivity factor is missing in some viruses within the taxonomic group. Error bars indicate the standard deviation from the mean genome size.

However, the representation of various viral taxonomic
groups differs considerably. In addition, some taxa show
quite large variation in genome size. Therefore, we
next asked whether or not the observed pattern of
distribution of replicase components depends on the taxonomic
classification of viruses. To address this question,
we arranged individual genomes according to their size 
without dividing into taxonomic groups and plotted the 
observed frequency of a particular DNA replicase compo- 
nent against the moving average of the genome size 
(Figure 2). To reduce sampling bias in this analysis, we performed
pairwise genome comparisons and retained only
236 viral genomes that were <70% identical to each other.

Figure 2. Dependence between the observed frequencies of viral DNA replicase components and the genome size of dsDNA viruses. X-axis:
genomes arranged by their size (from smallest to largest); primary y-axis (left): observed frequencies of various DNA replicase components in viral
genomes; secondary y-axis (right): genome size (kb). The genome size and the observed frequencies of DNA replicase components were averaged using
a moving window of 40 genomes and a single-genome step. The broken blue line corresponds to the averaged genome size. Solid lines correspond to
averaged observed frequencies of individual DNA replicase components: all DNA polymerase types, red; PolBp, green; PolA, yellow; PolBr, gray;
PolC, pink; known and predicted processivity factors, black.

Again, the plot showed a clear relationship between DNA 
replicase components and the genome size, indicating 
that this is a general property and not the result of 
taxon-specific division. 
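
The moving-window averaging behind Figure 2 (windows of 40 genomes advanced one genome at a time) can be sketched as follows. The input format, one (genome size, component present) pair per genome, is an assumption made for illustration.

```python
from typing import List, Sequence, Tuple


def moving_window_frequency(
    genomes: Sequence[Tuple[float, bool]], window: int = 40
) -> List[Tuple[float, float]]:
    """Slide a window of `window` genomes (step 1) over genomes sorted by size.

    Each genome is (size_kb, has_component). Returns (mean genome size,
    observed frequency of the component) for every window position.
    """
    ordered = sorted(genomes, key=lambda g: g[0])
    points: List[Tuple[float, float]] = []
    for start in range(len(ordered) - window + 1):
        chunk = ordered[start:start + window]
        mean_size = sum(size for size, _ in chunk) / window
        frequency = sum(1 for _, present in chunk if present) / window
        points.append((mean_size, frequency))
    return points


toy = [(1.0, False), (2.0, False), (3.0, True), (4.0, True), (5.0, True)]
print(moving_window_frequency(toy, window=2))
# [(1.5, 0.0), (2.5, 0.5), (3.5, 1.0), (4.5, 1.0)]
```

With 236 genomes and a 40-genome window, this yields 197 window positions.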

Having established a general dependency of the 
presence and the type of viral DNA replicase components 
on the genome size (Figures 1 and 2), we were nonetheless 
puzzled by the substantial number of seeming exceptions. 
While DNA polymerases are present in all taxonomic
groups above a certain genome size, processivity
factors and clamp loaders are not. If we assume that
DNA replicase processivity properties become more important
as the genome size increases, how can we rationalize
the absence of DNA sliding clamps and clamp loaders in
some taxa with a large average genome size? To
address this question, we performed a detailed analysis 
of sequence and structure properties of DNA polymer- 
ases, sliding clamp homologs and clamp loader subunits. 
Results of this analysis for each of the three components 
of DNA replicases are presented in separate sections 
below. 

DNA polymerases 

Major DNA polymerase groups. We identified DNA poly- 
merases in 415 out of the 808 analyzed genomes of 
dsDNA viruses. The majority of DNA polymerases 
(255 genomes) belong to the B-family, less frequently (132) to the A-family,
and very rarely (28) to the C-family. No polymerases of
the archaeal D-family were detected. B-family polymer- 
ases are present in viruses that infect organisms from all 
three domains of life. In contrast, we found A- and 
C-family polymerases only in bacteriophage genomes. 
The greatest diversity by far is among B-family 
members, followed by the distantly related A-family 
(Figure 3). Most proteins belonging to the evolutionarily
unrelated C-family are fairly similar to each other.

Based on sequence similarity, PolB polymerases can be 
divided into three distinct clusters: one including protein- 
primed (PolBp), and two that include RNA-primed 
(PolBr) polymerases (Figure 3). PolBp DNA polymerases 
include mutually highly similar adenoviral polymerases 
(PolBpAdeno) and significantly more diverse subgroups 
from bacteriophages (PolBpPhages) and archaeal viruses 
(PolB Arch Vir). The largest of the two PolBr clusters 
contains the majority of viral RNA-primed DNA poly- 
merases of B-family (PolBrCore group) and the small 
PolIIphages subgroup. The PolBrCore group, typified by 
polymerases from T4-like phages and Herpes Simplex 
virus 1, is closely related to eukaryotic and archaeal polymerases
(e.g. yeast Polδ and the archaeal Pfu DNA polymerase).
Members of the PolIIphages subgroup can be
distinguished from the main PolBrCore group by the char- 
acteristic motif ('NTDG') in the polymerase active site 
and a higher similarity to E. coli PolII. The small second
PolBr cluster consists of highly divergent PolBrPhiKZ
polymerases identified in this study for the first time.

Figure 3. DNA polymerases of A- and B-families clustered by pairwise sequence similarity. Nodes represent individual sequences. Lines connect
sequences with P < 1e-05. Line shading corresponds to P-values according to the scale in the bottom-right corner (light and long lines connect
distantly related sequences). A-family DNA polymerases are represented using shades of orange, PolBp using shades of green and PolBr using shades of gray;
well-known cellular DNA polymerases are shown in white. Newly identified DNA polymerases are marked with the red ellipse. ArchVir, archaeal
viruses; Adeno, Adenoviridae; gr, group; PhiKZ, phiKZ-like phages; Pfu, Pyrococcus furiosus; Sce, Saccharomyces cerevisiae; Taq, Thermus aquaticus.

PhiKZ-like viruses have a genome that is almost twice 
as large as that of T4 phage (e.g. Pseudomonas phage
201phi2, 317 kb; T4, 169 kb), yet no DNA polymerases
were found in their genome sequences during previous 
analyses (39-41). Since our initial data suggested that 
the absence of a polymerase gene in viral genomes of 
this size is highly unlikely, we performed a particularly 
thorough analysis of the genomes of PhiKZ-like phages. 
Not surprisingly, standard homology detection methods 
(BLAST, RPS-BLAST and PSI-BLAST) failed to detect 
statistically significant similarity between predicted proteins
of these phages and any known polymerases. Only
when we applied very sensitive homology search methods
based on profile-profile comparison were we able to identify
putative polymerases. Thus, HHsearch (27) matched
Pseudomonas phage EL hypothetical protein (gi: 82700954)
and the RB69 (T4-like) phage DNA polymerase
gp43 with high statistical significance (89% probability).
For the same phage EL protein, COMA (42)
also identified a B-family DNA polymerase (from
Thermococcus sp.) as the best match (E = 4e-07). The
putative EL polymerase and its homologs in the other 
two phiKZ-like phages apparently include all the polymer- 
ase domains characteristic of gp43 except for the 
N-terminal region, which harbors the 3'-5' exonuclease
domain. Interestingly, the 3'-5' exonuclease domain in
these phages has been detected previously as a separate
ORF (41). Thus, 3'-5' exonuclease and polymerase activities
in these phages appear to reside in two separate polypeptide
chains (Figure 4). To further validate the
polymerase assignment, we analyzed the motifs essential
for DNA polymerase function. Both sequence motifs
harboring active site residues are conserved between RB69 
gp43 and predicted polymerases in all three phiKZ-like 
phages (Figure 4B). In particular, as illustrated with a 
3D model of the predicted EL polymerase active site, 
both aspartates (Figure 4C) involved in the coordination 
of metal ions are absolutely conserved. B-family polymer- 
ases often interact with corresponding DNA sliding 
clamps through a short C-terminal sequence motif. 
Predicted polymerases of phiKZ-like phages feature at the very
C-terminus a consensus motif, which may be
considered to represent a variant of the clamp-binding
motif (16). The functional significance of this motif
('TRLISDFY', key hydrophobic positions are underlined)
is not obvious, as aromatic residues in one of the three
polymerases are substituted with hydrophilic ones. We
also did not find homologs of sliding clamps in the
genomes of the phiKZ-like group. However, it remains
possible that the corresponding proteins are encoded in these
genomes but have diverged beyond recognition.

Figure 4. Comparison of DNA polymerases from phiKZ-like phages
and the RB69 phage. (A) Correspondence of structural domains in
Pseudomonas phage EL 3'-5' exonuclease (gi: 82700984) and DNA
polymerase with those in the RB69 DNA polymerase. N, N-terminal;
P, palm; F, fingers; T, thumb. Red stars indicate positions of the active
site aspartates (D229 and D398). The correspondence was derived using
the COMA server. Unaligned regions are represented as white boxes.
(B) Alignment of the DNA polymerase active site motifs. For each
sequence, the beginning and end positions are indicated. Numbers in
parentheses correspond to the number of residues omitted from the
alignment. Sequence labels consist of the phage acronym, the protein
name, and the gi number (PDB code in the case of RB69). (C) A 3D
model of the Pseudomonas phage EL DNA polymerase active site complexed
with the primed DNA and the incoming dTTP, based on the
ternary complex of the RB69 DNA polymerase and DNA (PDB
code: 1ig9). A fragment of the polymerase active site is shown in
cartoon representation. Side chains of the active site aspartates
coordinating two metal ions (green spheres) are shown as pink sticks.

A-family DNA polymerases could be subdivided into 
three groups. The most diverse group, PolAgr1, contains
phages such as phiKMV, L5, N4, T5, SPO1, RSL1 and
Ma-LMM01. Interestingly, the SPO1 DNA polymerase
has an additional uracil-DNA glycosylase (UDG)
domain at its N-terminus. It has been hypothesized that
the UDG domain may serve as an intrinsic polymerase
processivity factor (43). According to our analysis, the T5
DNA polymerase, which is highly processive (44), also has
a UDG domain-like extension at the N-terminus.
Taking into account that UDG (D4) in complex with
A20 confers DNA polymerase processivity in the eukaryotic
vaccinia virus (45), the role of the UDG domain as an
intrinsic polymerase processivity factor is quite likely.
Groups 2 and 3 consist of T7-like and Bcep1-like viruses,
respectively.

Viral C-family DNA polymerases have a domain organization
similar to that of E. coli polIIIα (4). The conservation
extends from the N-terminal PHP domain and
includes the polymerase active site as well as the 'fingers'
domain. However, the C-terminal region following the
'fingers' domain does not show significant similarity to
the E. coli replicative polymerase, suggesting that it may
include different structural domains. Only the DNA polymerase
from Bacillus phage 0305phi8-36 (gi: 154622917)
appears to extend sequence conservation past the
'fingers' domain and into the OB-domain. In addition,
this polymerase has a sequence motif (1131-EEDLL-1135)
that aligns to the polIIIβ interaction motif in
E. coli polIIIα (920-QADMF-924), suggesting that it may
utilize a DNA sliding clamp to achieve processivity.
Incidentally, Bacillus phage 0305phi8-36 has the largest
genome of those found to carry a C-family polymerase,
and it is the only one among them in which we found a polIIIβ
homolog (gi: 154622720).

Distinct subgroups of RNA-primed B-family DNA
polymerases. The application of a more stringent clustering
procedure (using CLANS coupled with BLAST
instead of PSI-BLAST) revealed a number of subgroups
within the large PolBrCore cluster (Supplementary
Figure S1). Since most PolBrCore polymerases are
present in viruses with fairly large genomes, we analyzed 
polymerase sequences from poorly characterized sub- 
groups to obtain hints as to the possible DNA replication 
processivity mechanisms. Polymerases of T4-like phages 
and herpesviruses that utilize DNA sliding clamps as 
processivity factors are known to possess characteristic 
clamp-binding motifs at their C-termini (16). Therefore, 
we looked for the presence of any clamp-binding motifs 
in all remaining subgroups. We readily identified a 
putative PCNA-interacting motif (the consensus 
sequence QxxIxxFF, where x is any amino acid) within 
the C-terminus of phycodnaviral DNA polymerases. In 
other subgroups we either did not find any clamp-binding 
motifs, the alignments of C-terminal regions were too 
variable or the number of sequences was too small to 
draw a definite conclusion. In addition to clamp-binding
motifs we looked for the presence of additional domains. 
It turned out that the members of three outlying sub- 
groups (Malacoherpesviridae, Alloherpesviridae and 
Nimaviridae families; Supplementary Figure S1) feature
additional sequence regions compared with typical 
PolBrCore representatives. Although we were unable to 
confidently assign any known functional/structural 
domains to these additional polymerase regions, their 
very presence suggests that these three viral families may 
have evolved alternative processivity mechanisms for the 
efficient replication of their large genomes. 
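
A simple scan for the QxxIxxFF consensus in polymerase C-termini could look like the sketch below. The 50-residue C-terminal window and the function name are hypothetical choices, and a plain pattern match of this kind only flags candidates; the analysis described above relied on alignments of the C-terminal regions.

```python
import re
from typing import List, Tuple

# Consensus from the text: QxxIxxFF, where x is any amino acid.
# The lookahead lets overlapping occurrences be reported.
PIP_LIKE = re.compile(r"(?=(Q..I..FF))")


def find_clamp_binding_motifs(seq: str, c_terminal_window: int = 50) -> List[Tuple[int, str]]:
    """Return (1-based position, motif) for QxxIxxFF hits within the C-terminal window."""
    seq = seq.upper()
    offset = max(0, len(seq) - c_terminal_window)
    tail = seq[offset:]
    return [(offset + m.start() + 1, m.group(1)) for m in PIP_LIKE.finditer(tail)]


print(find_clamp_binding_motifs("M" + "A" * 100 + "QTSIKSFF" + "GK"))
# [(102, 'QTSIKSFF')]
```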

Processivity factors

Diversity and taxonomic distribution. As in the
case of DNA polymerases, we asked whether each of the
analyzed viral genomes encodes a polymerase processivity
factor. In particular, we looked for homologs of either
cellular (PCNA and polIIIβ) or viral (gp45, UL42,
UL44 and BMRF1) DNA sliding clamps. As a result, in
addition to already characterized or annotated sliding
clamps, we discovered two new putative processivity
factors: a PCNA homolog in the family Ascoviridae and
a polIIIβ homolog in the Ma-LMM01 phage. All sliding
clamp homologs identified in viral genomes were pooled
together with representatives of cellular sliding clamps
(PCNA and polIIIβ) and clustered. The results shown in
Figure 5 indicate that, just like DNA polymerases, viral
DNA sliding clamp homologs are considerably more diverse
than their cellular counterparts. Two major clusters
correspond to the PCNA and polIIIβ families. PolIIIβ
homologs were found only in phages, while all PCNA
homologs (except for PCNA from the archaeal virus
Natrialba phage PhiCh1 and some baculoviruses) were
found in eukaryote-infecting nucleo-cytoplasmic large
DNA viruses (Figure 1). PCNA homologs from
iridoviruses infecting cold-blooded vertebrates form a
distinct subgroup in the PCNA cluster (Figure 5,
CBvertIrido). In addition to the two major clusters
corresponding to the PCNA and polIIIβ families, there are two
compact outlying groups: gp45 and UL42. Gp45 
includes DNA sliding clamps from T4-like phages, UL42 
is found in Herpesviridae, with both groups having struc- 
turally characterized representatives (17,46). Three add- 
itional divergent families of viral sliding clamps (UL44, 
BMRF1 and G8R) are not included in Figure 5 as the 
clustering procedure was unable to link these families
to any other clamps. However, it is known that
herpesviral UL44 and BMRF1 are structurally similar to 
UL42 and other DNA sliding clamps (18,19). G8R is a 
remote PCNA homolog (47) found in vaccinia virus and 
other members of the Chordopoxvirinae subfamily, 
however, it does not act as a processivity factor in DNA 
replication (48). 
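The clustering step can be illustrated with a minimal Python sketch. It assumes that pairwise similarity scores between sequences have already been computed, and the sequence names and scores below are hypothetical toy values. Sequences are grouped by single linkage: any pair scoring at or above a threshold joins the same cluster. This is a simplified stand-in and not the procedure used to produce Figure 5. It does show how a family that never reaches the threshold with any outside sequence ends up as an isolated group, as happened with UL44, BMRF1 and G8R.

```python
from itertools import combinations

def single_linkage_clusters(names, similarity, threshold):
    """Group names into connected components of the 'similarity >= threshold' graph.

    similarity maps frozenset({a, b}) to a score; unscored pairs count as 0.
    """
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if similarity.get(frozenset((a, b)), 0.0) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    # Largest clusters first; ties broken by member list.
    return sorted(clusters.values(), key=lambda c: (-len(c), c))

# Hypothetical toy scores, not data from the study; "f" has no scored partners.
toy = {frozenset(("a", "b")): 0.9, frozenset(("b", "c")): 0.7,
       frozenset(("d", "e")): 0.8, frozenset(("c", "d")): 0.1}
print(single_linkage_clusters(["a", "b", "c", "d", "e", "f"], toy, 0.5))
# prints [['a', 'b', 'c'], ['d', 'e'], ['f']]
```

Lowering the threshold merges groups. The apparent number of clusters therefore depends on where the cut-off is placed.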

Figure 5. DNA sliding clamps and their homologs grouped by pairwise sequence similarity. Sliding clamps of model cellular organisms are labeled in white. Newly identified sliding clamp homologs are marked with ellipses. Ma-LMM01, Microcystis phage Ma-LMM01; RSL1, Ralstonia phage RSL1; 73, Pseudomonas phage 73; BcepGomr, Burkholderia phage BcepGomr; 0305phi8-36, Bacillus phage 0305phi8-36; Eco, Escherichia coli; ASFV, African swine fever virus; DpAV4a, Diadromus pulchellus ascovirus 4a; CBvertIrido, iridoviruses of cold-blooded vertebrates.

During the search for PCNA homologs, we identified the PCNA of ascovirus DpAV-4a as one of the unassigned ORFs (File 1 in Supplementary Data) after six-frame translation of the genome. We also found highly divergent PCNA homologs in two other ascoviruses, HvAV-3e (gi: 134287330) and SfAV-1a (gi: 11932043 and 11932044). However, these sequences align poorly to cellular PCNAs and seem to be incomplete. In addition, the putative PCNA in SfAV-1a is split into two ORFs. These observations suggest that the PCNA homologs in the HvAV-3e and SfAV-1a ascoviruses are likely non-functional. Some viruses encode not one but several copies of PCNA. The Phycodnaviridae family viruses EhV-86 and PBCV-1 have two, and Mimivirus has three (Supplementary Table S1). However, one PCNA each from PBCV-1 and Mimivirus (PBCV1_PCNA1 and MimiPCNA1, respectively) is more similar to the PCNAs present as single copies in the phycodnavirus Ostreococcus virus OsV5 and in CroV, a recently sequenced relative of Mimivirus (49). The PCNA1 sequences of PBCV-1 and Mimivirus are therefore expected to represent orthologs essential for viral DNA replication. In contrast, PBCV1_PCNA2 and Ehv86_PCNA2 are most similar to PCNAs from algae, so they were likely acquired from the host. MimiPCNA2 and MimiPCNA3 show the highest similarity to MimiPCNA1. They most probably result from the multiple gene and genome duplication events inferred to have occurred during Mimivirus evolution (50).
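Detecting an unannotated ORF by six-frame translation can be sketched in Python as follows. The sketch assumes the standard genetic code (NCBI translation table 1). It defines ORFs as stop-to-stop stretches above a hypothetical minimum length. It is an illustration only, not the pipeline used in the study.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order: TTT, TTC, TTA, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def translate(dna: str) -> str:
    """Translate codon by codon; unknown codons become 'X', a trailing partial codon is dropped."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))

def six_frame(dna: str) -> dict:
    """Return translations of the three forward (+1..+3) and three reverse (-1..-3) frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames

def orfs(protein: str, min_len: int = 100) -> list:
    """Stop-to-stop stretches ('*'-delimited) of at least min_len residues (hypothetical cut-off)."""
    return [p for p in protein.split("*") if len(p) >= min_len]
```

For example, `six_frame("ATGGCCTAA")["+1"]` gives `"MA*"`. Each ORF returned by `orfs` would then be searched against protein databases. An ORF that is missing from the genome annotation but matches PCNA is the kind of hit described above.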

We detected PolIIIβ homologs in only 12 phages. Of these, seven have a typical length and five are shorter, covering only the second and third domains of PolIIIβ (Supplementary Figure S2). We identified a full-length, distant PolIIIβ homolog in the Ma-LMM01 phage for the first time (HHsearch probability of 96%). The Ma-LMM01 PolIIIβ gene (locus tag: MaLMM01_gp176) is located near other DNA replication genes (51), which supports its putative function as a processivity factor.

A number of the identified viral sliding clamp homologs may have been acquired through horizontal gene transfer. Three lines of evidence point to this: a patchy taxonomic distribution, high similarity to the corresponding host proteins, and the absence of a DNA polymerase in the viral genome. For example, only nine out of 53 baculoviruses have PCNA homologs, and seven of these show high similarity to PCNAs from mosquitoes and moths (Supplementary Figure S3). For one baculovirus, Autographa californica nucleopolyhedrovirus (AcMNPV), its own PCNA has been shown to be unnecessary for genome replication (52). PolIIIβ and PCNA homologs that were likely acquired through horizontal gene transfer (Supplementary Table S2) are either known to be, or can be assumed to be, dispensable for DNA replication. We therefore did not include them in the summary presented in Figures 1 and 2.
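The three lines of evidence for horizontal acquisition listed above can be tallied for each homolog. The Python sketch below is a hypothetical bookkeeping aid, not a procedure from the study. Its input fields and prevalence cut-off are assumptions. In practice, each criterion would come from a separate analysis: taxonomic distribution, database searches and genome annotation.

```python
from dataclasses import dataclass

@dataclass
class ClampHomolog:
    name: str
    family_prevalence: float      # fraction of genomes in the viral family carrying the homolog
    best_hit_is_host: bool        # closest database match is a host protein
    virus_has_polymerase: bool    # the same genome encodes its own DNA polymerase

def hgt_indicators(h: ClampHomolog, patchy_cutoff: float = 0.5) -> list:
    """List which HGT indicators a homolog satisfies (the cut-off is hypothetical)."""
    signs = []
    if h.family_prevalence < patchy_cutoff:
        signs.append("patchy distribution")
    if h.best_hit_is_host:
        signs.append("host-like sequence")
    if not h.virus_has_polymerase:
        signs.append("no viral DNA polymerase")
    return signs
```

For instance, a PCNA present in 9 of 53 baculovirus genomes (a prevalence of about 0.17) would be flagged as having a patchy distribution under this cut-off.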

Unexpectedly, we did not find homologs of any known processivity factors in some viral families with a large average genome size. These include the eukaryotic Nimaviridae, Alloherpesviridae and Malacoherpesviridae families, as well as phiKZ-like phages and Clostridium phage c-st (Figure 1). However, as discussed in the 'Polymerases' section, the DNA polymerases of the three eukaryotic viral families are atypical B-family members with additional uncharacterized domains (Supplementary Figure S1). The Clostridium phage c-st DNA polymerase is a C-family polymerase with a divergent C-terminal region. These observations suggest that viruses from these families may use different mechanisms to ensure DNA replication processivity. For phiKZ-like phages, it remains an open question whether processivity factors are indeed absent from their genomes.

Electrostatic properties. The distribution of DNA sliding clamps in viral genomes (Figure 1) reveals an inconsistency. Bacillus phage 0305phi8-36, which carries a PolIIIβ gene, completely lacks clamp loader subunits, and so do several families of eukaryotic viruses that carry PCNA genes. A clamp loader is needed to open ring-shaped PolIIIβ or PCNA and load it onto DNA, so this finding raises the question of how these sliding clamps function. One possibility is that these viruses use the host clamp loader. Another possibility is that these clamps do not form a closed ring and, like UL42 or UL44, bind DNA directly without the need for a clamp loader. The first possibility cannot be explored using computational approaches, but the second one can.

Figure 6. Electrostatic properties of processivity factors and their homologs. (A) Average theoretical pIs of DNA sliding clamp subunits from cellular organisms (green bars) and viruses (yellow bars). Bars with the grid pattern correspond to viral sliding clamp homologs that are accompanied by clamp loader subunits in the genome. (B) Electrostatic potential maps of the solvent-accessible surface of five representatives (red indicates negative and blue positive potential; scale units, k_BT/e_c). All structures are shown in the same orientation as ScePCNA complexed with DNA (PDB code: 3k4x). arch., Archaea; asco., Ascoviridae; ASFV, African swine fever virus; euk., Eukarya; hHus1, Homo sapiens Hus1 (PDB code: 3g65); HHV5_UL44, Human herpesvirus 5 UL44 (PDB code: 1t6l); irido., Iridoviridae; PCNA mars., Marseillevirus PCNA (gi: 284504238); PCNA mim., Mimivirus PCNA (gi: 55664866); PBCV1_PCNA1, Paramecium bursaria Chlorella virus 1 PCNA1 (gi: 9631761); Sce, S. cerevisiae; SSTIV_PCNA, soft-shelled turtle iridovirus PCNA (gi: 228861299); PolIIIβ vir1, PolIIIβ from Microcystis phage Ma-LMM01 and Ralstonia phage RSL1 (gi: 117530347 and 189233246, respectively); PolIIIβ vir2, PolIIIβ from Bacillus phage 0305phi8-36 (gi: 154622720).

Non-ring sliding clamps (e.g. UL42, UL44) differ from ring-forming ones (PCNA, PolIIIβ) in one observed respect: they carry an increased positive charge on the DNA-binding face (53,54). To explore the electrostatic properties of all the identified viral sliding clamp homologs, we calculated their theoretical pIs. We also constructed 3D models for representatives of viral PCNA homologs (Supplementary Table S3) and analyzed the electrostatic properties of their surfaces. We then compared these data with those of structurally and functionally characterized cellular and viral processivity factors (Figure 6 and Supplementary Table S4). The pIs of sliding clamp homologs turned out to correlate strikingly with the presence or absence of clamp loader subunits in the corresponding viral families. Phycodnaviridae and Mimivirus PCNAs, predicted to be orthologous, have electrostatic properties similar to those of ring-shaped sliding clamps. In contrast, the electrostatic properties of G8R and of the PCNAs of Asfarviridae (ASFV), Irido-Asco viruses and Marseillevirus are more similar to those of herpesviral non-ring processivity factors. Phycodnaviridae and Mimivirus have RFC homologs, while Asfarviridae, Irido-Asco viruses and Marseillevirus do not. A similar correlation holds for sliding clamp homologs in bacteriophages. PolIIIβ homologs in phages Ma-LMM01 and RSL1 (Figure 6, PolIIIβ vir1) have much lower pI values than the PolIIIβ of Bacillus phage 0305phi8-36 (PolIIIβ vir2). Phages Ma-LMM01 and RSL1 do encode clamp loader subunits, while Bacillus phage 0305phi8-36 does not. Based on these electrostatic properties, DNA sliding clamp homologs from Phycodnaviridae and Mimiviridae are expected to form rings. PCNA homologs in the remaining families and PolIIIβ from Bacillus phage 0305phi8-36, by contrast, are likely to bind DNA directly, in a manner that does not require clamp loaders. The pI values place the PolIIIβ homologs of Ma-LMM01 and RSL1 phages between the characterized ring-forming and non-ring sliding clamps. However, the presence of clamp loader subunits (PolIIIγ) in the corresponding genomes suggests that a closed-ring PolIIIβ structure is more likely.
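A theoretical pI is the pH at which the charges of all ionizable groups sum to zero. The following Python sketch computes it by bisection using the Henderson-Hasselbalch equation. The pKa set is an assumption (EMBOSS-style values) and may differ from the values used to build Figure 6. Different pKa tables shift pIs somewhat, so only pIs computed with the same table should be compared.

```python
# Assumed pKa values (EMBOSS-style); not necessarily those used for Figure 6.
PKA_BASIC = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}

def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of a protein sequence at a given pH."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1  # one free amino and one free carboxyl terminus
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_BASIC.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_ACIDIC.items())
    return positive - negative

def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Find the pH of zero net charge by bisection (net charge falls as pH rises)."""
    seq = seq.upper()
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Net charge falls monotonically as pH rises, so bisection between pH 0 and 14 is guaranteed to converge. A single pI summarizes the whole protein. The surface electrostatic maps in Figure 6B are still needed to show where the positive charge sits, in particular on the DNA-binding face.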

To our surprise, we found that the electrostatic properties of the human checkpoint protein Hus1 are also similar to those of non-ring viral processivity factors (Figure 6). The same holds, to a lesser degree, for Rad9 but not for Rad1. Previous experiments established that Rad9, Hus1 and Rad1 form a heterotrimeric PCNA-like complex (the 9-1-1 checkpoint complex) and that they do not self-multimerize (55). It has also been shown that individual subunits can interact in a pairwise manner (55). Our results, combined with these experimental data, suggest that Hus1 and perhaps Rad9 might also bind DNA directly, either as monomers or as components of heterodimeric subcomplexes. Unfortunately, no experimental data on the DNA-binding properties of Hus1, Rad9 and Rad1 appear to be available.

Clamp loaders 

Of the three replicase components, clamp loader subunit homologs occur in the fewest viral genomes. However, their genomic distribution appears to be highly non-random. We detected clamp loader subunits only in viruses with the largest genomes and only in those that also code for homologs of DNA sliding clamps. Moreover, as noted above, the presence of clamp loader subunits correlates with the electrostatic properties of DNA sliding clamps in the corresponding viral families. Accordingly, we found homologs of RFC subunits only in Mimivirus and Phycodnaviridae. These are the only two families whose PCNAs have electrostatic properties similar to those of ring-forming cellular PCNAs (Figures 1 and 6). Mimivirus and its relative CroV code for all five RFC subunits. Members of the Phycodnaviridae family have only a homolog of the largest RFC subunit, which is similar to the archaeal large RFC subunit (RFCL). The exceptions include EsV-1, which encodes all five RFC subunits, and two other viruses (Ostreococcus virus OsV5 and Ostreococcus tauri virus 1) that do not have any RFC subunit. Interestingly, the genomes of the latter two viruses are among the smallest in the family. Homologs of bacterial clamp loader subunits were identified in only two phages, RSL1 and Ma-LMM01. In each case we found a homolog of only a single clamp loader subunit, PolIIIγ. Both PolIIIγ homologs have conserved P-loop, DEXX and SRC motifs (Figure 7), which suggests that they are active ATPases. Again, the PolIIIβ homologs in these two phages have significantly lower pIs than the PolIIIβ of Bacillus phage 0305phi8-36, which lacks any clamp loader subunit (Figure 6). T4-like clamp loaders consisting of gp44 and gp62 subunits were identified only in T4-like phages.
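The argument linking clamp loader presence and pI to the predicted clamp architecture can be written as a small decision rule. The sketch below is a hypothetical Python encoding of that argument. The pI cut-off is illustrative and was not derived in the study.

```python
def predicted_clamp_mode(pi: float, has_clamp_loader: bool, pi_cutoff: float = 7.5) -> str:
    """Hypothetical encoding of the ring vs. non-ring argument (the cut-off is illustrative)."""
    if has_clamp_loader:
        # A clamp loader in the genome favours a closed ring, even at intermediate pI.
        return "ring"
    if pi >= pi_cutoff:
        # Elevated pI without a clamp loader resembles UL42/UL44-like direct DNA binding.
        return "non-ring (direct DNA binding)"
    # Low pI and no clamp loader: the virus might rely on the host clamp loader.
    return "undetermined"
```

The rule checks for a clamp loader first. This mirrors the conclusion that the intermediate-pI PolIIIβ homologs of Ma-LMM01 and RSL1 most likely form closed rings.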

All five RFC subunits from Mimivirus and the phycodnavirus EsV-1 are similar to the corresponding human and yeast proteins (Figure 7). They have motifs for both ATP binding (P-loop) and hydrolysis (DEXX motif). However, there are a few differences from eukaryotic RFC. Structural studies of the yeast RFC-PCNA complex (56) and biochemical experiments (57,58) together indicate that RFC1, RFC3 and RFC5 interact with the corresponding hydrophobic pockets of PCNA protomers. Human and yeast RFC1, RFC3 and RFC5 have progressively 'weaker' PCNA-interaction (PIP-box) motifs (Figure 7), matching their decreasing PCNA-binding strength (56-58). In Mimivirus, the RFC1 and RFC3 PIP-boxes follow the same trend, but the PIP-box in RFC5 more closely resembles the one in RFC1. Interestingly, EsV-1 has the 'strongest' PCNA-interaction motif in RFC5, followed by RFC3, and no PIP-box in the RFC large subunit. Notably, some eukaryotes also show a similar non-canonical distribution of PIP-box 'strength' among RFC1, RFC3 and RFC5 (Supplementary Figures S4-S6). Other phycodnaviruses, including FSV, EhV-86 and Chlorella viruses, have only a homolog of the RFC large subunit, which, like EsV-1 RFCL, has no apparent PIP-box (Figure 7). At least in Chlorella viruses, RFCL appears to be an inactive ATPase because of non-canonical substitutions in the P-loop and DEXX motifs. These motifs are essential for ATP binding and hydrolysis in the AAA+ protein family (59).
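The motif checks discussed above can be approximated with simple pattern scans. The Python sketch below assumes three commonly cited consensus patterns:

- the PIP-box Q-x-x-[ILM]-x-x-[FY]-[FY];
- Walker A (P-loop) G-x(4)-G-K-[ST];
- a simplified Walker B motif, four hydrophobic residues followed by D-E.

Counting matched PIP-box positions is a crude, hypothetical proxy for the motif 'strength' discussed in the text. The assessment in Figure 7 relies on structure-based alignment, not on scans of this kind.

```python
import re

# PIP-box consensus positions within an 8-residue window: Q-x-x-[ILM]-x-x-[FY]-[FY].
PIP_CONSENSUS = {0: "Q", 3: "ILM", 6: "FY", 7: "FY"}
WALKER_A = re.compile(r"G.{4}GK[ST]")        # P-loop (ATP binding)
WALKER_B = re.compile(r"[AILMFVCW]{4}DE")    # hydrophobic stretch + DE (hydrolysis); simplified

def pip_strength(seq: str) -> int:
    """Best number (0-4) of PIP-box consensus positions matched by any 8-residue window."""
    best = 0
    for i in range(len(seq) - 7):
        window = seq[i:i + 8]
        score = sum(window[pos] in allowed for pos, allowed in PIP_CONSENSUS.items())
        best = max(best, score)
    return best

def atpase_motifs(seq: str) -> dict:
    """Report whether the simplified Walker A and Walker B patterns occur anywhere in seq."""
    return {"walker_a": bool(WALKER_A.search(seq)), "walker_b": bool(WALKER_B.search(seq))}
```

A score of 4 corresponds to a fully canonical PIP-box such as the one in p21 (QTSMTDFY). Lower scores indicate progressively degenerate motifs.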



DISCUSSION 

Our results show that the presence and nature of DNA replicases encoded in the genomes of dsDNA viruses are related to genome size. As the genome size increases, viruses more often encode polymerase processivity components in addition to the DNA polymerase.



Viruses with genomes smaller than ~40 kb most often do not have their own DNA polymerases. When they do, it is usually a PolBp-type (protein-primed B-family) DNA polymerase. Interestingly, this holds for viruses infecting organisms from all domains of life. PolBp polymerases also disappear completely from larger viral genomes (Figures 1 and 2). Together, these observations suggest that the properties of protein-primed B-family DNA polymerases might be optimal for this genome size range.

In the next genome size range (~40-140 kb), A-family polymerases take over. However, it is not clear whether this dominance is significant. We detected A-family polymerases only in bacteriophages, and bacteriophage genomes are overrepresented in this size range. Nonetheless, even if we ignore the polymerase type, the typical feature of genomes in this size range is the lack of DNA sliding clamp homologs. E. coli polymerase I (A-family) has been shown to be stimulated by the PolIIIβ clamp (60). The absence of sliding clamp homologs therefore cannot be explained by an inability of polA to use a sliding clamp as a processivity factor. Moreover, we detected an A-family polymerase, a PolIIIβ homolog and a clamp loader subunit in two phages with large genomes (>150 kb), Ma-LMM01 and RSL1. This suggests that the PolIIIβ homolog may function as a processivity factor together with polA. On the other hand, some bacteriophages have evolved ways to increase the processivity of A-family polymerases without using DNA sliding clamps. One such solution is the recruitment of host thioredoxin, as observed in T7-like phages (61). The UDG-like domain in the DNA polymerases of SPO1-like and T5-like phages may well be another solution, which has yet to be addressed experimentally.

Both eukaryotic viruses and bacteriophages are found in the genome size range of 140 kb and larger. All of them have their own DNA polymerases, typically of the B-family. Our discovery of evolutionarily distant DNA polymerases in phiKZ-like phages has eliminated the only apparent exception to this rule. DNA replicases in this size range often include DNA polymerase processivity factors and sometimes clamp loaders. At first glance, there is no discernible pattern in the presence or absence of sliding clamp homologs and clamp loaders (Figure 1). However, a fairly coherent picture emerges when the properties of the DNA polymerases, the sliding clamp homologs and the presence of clamp loader subunits are considered together.

We did not find any sliding clamp homologs in several groups of large dsDNA viruses. However, the DNA polymerases of these viruses have either additional uncharacterized domains or non-homologous regions. These polymerases may gain increased intrinsic processivity from the additional or altered regions, or they may use alternative processivity factors. On the other hand, the absence of any detectable sliding clamp homolog in phiKZ-like phages is somewhat puzzling. Their polymerases, although evolutionarily distinct, seem to have a typical B-family architecture. In addition, two of the three polymerases carry a putative clamp-binding motif signature at their C-termini. It is quite possible that phiKZ-like phage genomes do encode processivity factors that are too strongly diverged to be detected with current methods.

Figure 7. Alignment of eukaryotic and viral clamp loader subunits. The sequence alignment is based on a multiple structure superposition of experimental X-ray structures and homology models obtained using MUSTANG (66). The secondary structure of the yeast RFC3 subunit (PDB code: 1sxj) is shown above the alignment. PBCV1, Paramecium bursaria Chlorella virus 1; EsV1, Ectocarpus siliculosus virus 1; EhV86, Emiliania huxleyi virus 86; FSV, Feldmannia species virus; Hsap, Homo sapiens; MaLMM01, Microcystis phage Ma-LMM01; Mimi, Mimivirus; RSL1, Ralstonia phage RSL1; Scer, Saccharomyces cerevisiae.



Among the viral families that do have homologs of DNA sliding clamps, a number surprisingly lack clamp loader subunits altogether. However, the subsequent analysis of the electrostatic properties of sliding clamp homologs was quite revealing. It showed elevated pIs for PCNA homologs from Irido-Asco viruses, Asfarviruses and Marseillevirus, and for a PolIIIβ homolog from Bacillus phage 0305phi8-36 (Figure 6A). Models of several representatives showed that most of the increased positive charge is localized to the DNA-interacting face (Figure 6B). This property is typical of well-characterized herpesviral processivity factors, which do not form rings but instead bind DNA directly as monomers (UL42) or dimers (UL44). This suggests a similar direct DNA-binding mode for sliding clamp homologs that have an elevated pI and no clamp loaders in the corresponding genomes. In this regard, two components of the human 9-1-1 complex show a similarly increased positive charge on the DNA-interacting side: Hus1 and, to a lesser degree, Rad9. In contrast, Rad1, the third component of the complex, has electrostatic properties similar to those of cellular PCNAs. This observation suggests that Hus1 and perhaps Rad9 could also bind DNA as monomers or as components of heterodimeric subassemblies. They could then serve as recruitment platforms or as processivity factors for other proteins. Interestingly, genetic data support possible additional and distinct roles for Hus1, Rad9 and Rad1. Experiments on telomere maintenance in Schizosaccharomyces pombe revealed that Rad1 mutants had telomere-shortening defects, whereas Hus1 and Rad9 mutants had normal telomere lengths (62). More recently, Rad1 was shown to require either Hus1 or Rad9 to carry out its telomere maintenance function (63). The different electrostatic properties of Rad9, Hus1 and Rad1 revealed in this study might be responsible for the observed differences in mutant phenotypes.

Of all replicase components, the findings concerning viral clamp loaders are perhaps the most puzzling. Only three eukaryotic viruses have a complete set of five RFC subunits corresponding to the eukaryotic clamp loader, RFC. As expected for a functional RFC, all three viruses have the characteristic P-loop and DEXX motifs in RFC1-4 and also carry PCNA-interacting (PIP-box) motifs in RFC1, RFC3 and RFC5. Their PIP-boxes revealed something interesting: the 'strength' of the PCNA-binding motifs can be distributed across the three RFC subunits differently than in human or yeast RFC (Figure 7). In other words, the 'strength' of the PCNA-binding motifs in RFC1, RFC3 and RFC5 appears to change independently over the course of evolution. Another observation supports this idea. In contrast to human and yeast, RFC5 sequences in some other eukaryotes (Supplementary Figures S4-S6) have a canonical PCNA-binding motif, while RFC1 has a strongly reduced one. Several members of the Phycodnaviridae family have only a single homolog of the RFC large subunit. Studies of human and yeast RFC have shown that the large RFC subunit determines the specificity for the clamp (1). For example, RFC1 determines specificity for PCNA, while Rad17 determines specificity for the 9-1-1 complex. Thus, the viral homolog of the large RFC subunit may recruit the four small RFC subunits of the host to form a pentameric complex specific for binding and loading viral PCNA. However, these RFC large subunits seem to lack PCNA-binding motifs completely, and some have non-canonical ATPase motifs. In yeast, a mutation in the ATP-binding motif of the large RFC subunit has been shown not to affect PCNA loading (64). The ATPase activity may therefore also be dispensable in viral RFC large subunits. It is not clear, though, how to reconcile the absence of a PCNA-binding motif with the expected specificity for viral PCNA. The two large phages Ma-LMM01 and RSL1 have a homolog of the bacterial clamp loader subunit PolIIIγ, an A-family DNA polymerase and a homolog of the PolIIIβ sliding clamp. For these two phages, the composition of the functional replicase is also unclear. Does the viral PolIIIγ recruit host clamp loader subunit(s) to produce a functional clamp loader specific for the viral PolIIIβ? Or is the composition of these clamp loaders analogous to that of the T4 clamp loader? The T4 clamp loader is made of four copies of gp44 (a PolIIIγ homolog) and a single taxon-specific subunit, gp62, which has no detectable homologs outside the T4-like group. Answering these questions will require laboratory experiments, for which computational methods can hardly substitute.

Overall, the connection we observed between virus genome size and DNA replicase components might help predict the expected type and completeness of replicase components in newly sequenced viral genomes. Our observations on DNA replicases in dsDNA viruses may also have more general significance. For example, the symbiotic bacteria of the genera Hodgkinia and Carsonella currently have the smallest known cellular genomes, 144 and 160 kb, respectively (65). Neither has a DNA sliding clamp or a clamp loader. However, the somewhat larger genomes of the symbionts Sulcia cicada (277 kb), Buchnera Cc (416 kb) and Nanoarchaeum equitans (491 kb) already have the complete set of DNA replicase components. As more large viral and small cellular genomes become available, it will be interesting to see how universal the observed relationship is.
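The size-dependent trends could set expectations for a newly sequenced dsDNA viral genome. The Python sketch below encodes them as a lookup, for illustration only. The ~40 kb and ~140 kb boundaries are the approximate ranges discussed above, not sharp cut-offs. The function name and output labels are hypothetical.

```python
def expected_replicase(genome_kb: float) -> dict:
    """Rough replicase expectations for a dsDNA virus, following the size trends in the text.

    The ~40 kb and ~140 kb boundaries are approximate; individual genomes can deviate.
    """
    if genome_kb < 40:
        return {"polymerase": "often absent; if present, usually PolBp",
                "sliding_clamp": "not expected",
                "clamp_loader": "not expected"}
    if genome_kb < 140:
        return {"polymerase": "A-family common (observed only in phages)",
                "sliding_clamp": "typically absent",
                "clamp_loader": "not expected"}
    return {"polymerase": "own polymerase expected, typically B-family",
            "sliding_clamp": "frequent (absence may point to an atypical polymerase)",
            "clamp_loader": "sometimes, and only together with a clamp homolog"}
```

A genome that departs from these expectations would be a candidate for closer inspection. For example, a large genome with neither a clamp homolog nor an atypical polymerase might encode a clamp too divergent to detect, as suggested for phiKZ-like phages.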

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online. 
ACKNOWLEDGEMENTS 

The authors would like to thank Giedrius Sasnauskas, 
Edita Suziedeliene and Ana Vencloviene for comments 
and suggestions. 

FUNDING 

Howard Hughes Medical Institute (55005627); Ministry 
of Education and Science of Lithuania. Funding for 
open access charge: European Community's Seventh 
Framework Programme (FP7-REGPOT-2009-1 project 
245721 'MoBiLi'). 

Conflict of interest statement. None declared. 



8304 Nucleic Acids Research, 2011, Vol. 39, No. 19 



REFERENCES 

1. Indiani,C. and O'Donnell,M. (2006) The replication clamp-loading machine at work in the three domains of life. Nat. Rev. Mol. Cell Biol., 7, 751-761.

2. Yang,J., Zhuang,Z., Roccasecca,R.M., Trakselis,M.A. and Benkovic,S.J. (2004) The dynamic processivity of the T4 DNA polymerase during replication. Proc. Natl Acad. Sci. USA, 101, 8289-8294.

3. Leipe,D.D., Aravind,L. and Koonin,E.V. (1999) Did DNA replication evolve twice independently? Nucleic Acids Res., 27, 3389-3401.

4. Lamers,M.H., Georgescu,R.E., Lee,S.G., O'Donnell,M. and Kuriyan,J. (2006) Crystal structure of the catalytic alpha subunit of E. coli replicative DNA polymerase III. Cell, 126, 881-892.

5. Bailey,S., Wing,R.A. and Steitz,T.A. (2006) The structure of T. aquaticus DNA polymerase III is distinct from eukaryotic replicative DNA polymerases. Cell, 126, 893-904.

6. Cann,I.K., Komori,K., Toh,H., Kanai,S. and Ishino,Y. (1998) A heterodimeric DNA polymerase: evidence that members of Euryarchaeota possess a distinct DNA polymerase. Proc. Natl Acad. Sci. USA, 95, 14250-14255.

7. Ishino,Y., Komori,K., Cann,I.K. and Koga,Y. (1998) A novel DNA polymerase family found in Archaea. J. Bacteriol., 180, 2232-2236.

8. Berquist,B.R., DasSarma,P. and DasSarma,S. (2007) Essential and non-essential DNA replication genes in the model halophilic Archaeon, Halobacterium sp. NRC-1. BMC Genet., 8, 31.

9. Kamtekar,S., Berman,A.J., Wang,J., Lazaro,J.M., de Vega,M., Blanco,L., Salas,M. and Steitz,T.A. (2004) Insights into strand displacement and processivity from the crystal structure of the protein-primed DNA polymerase of bacteriophage phi29. Mol. Cell, 16, 609-618.

10. Doublie,S., Tabor,S., Long,A.M., Richardson,C.C. and Ellenberger,T. (1998) Crystal structure of a bacteriophage T7 DNA replication complex at 2.2 Å resolution. Nature, 391, 251-258.

11. Delarue,M., Poch,O., Tordo,N., Moras,D. and Argos,P. (1990) An attempt to unify the structure of polymerases. Protein Eng., 3, 461-467.

12. Koonin,E.V. (2006) Temporal order of evolution of DNA replication systems inferred by comparison of cellular and viral DNA polymerases. Biol. Direct, 1, 39.

13. Bruck,I. and O'Donnell,M. (2001) The ring-type polymerase sliding clamp family. Genome Biol., 2, REVIEWS3001.

14. Williams,G.J., Johnson,K., Rudolf,J., McMahon,S.A., Carter,L., Oke,M., Liu,H., Taylor,G.L., White,M.F. and Naismith,J.H. (2006) Structure of the heterotrimeric PCNA from Sulfolobus solfataricus. Acta Crystallogr. Sect. F Struct. Biol. Cryst. Commun., 62, 944-948.

15. Parrilla-Castellar,E.R., Arlander,S.J. and Karnitz,L. (2004) Dial 9-1-1 for DNA damage: the Rad9-Hus1-Rad1 (9-1-1) clamp complex. DNA Repair, 3, 1009-1014.

16. Dalrymple,B.P., Kongsuwan,K., Wijffels,G., Dixon,N.E. and Jennings,P.A. (2001) A universal protein-protein interaction motif in the eubacterial DNA replication and repair systems. Proc. Natl Acad. Sci. USA, 98, 11627-11632.

17. Zuccola,H.J., Filman,D.J., Coen,D.M. and Hogle,J.M. (2000) The crystal structure of an unusual processivity factor, herpes simplex virus UL42, bound to the C terminus of its cognate polymerase. Mol. Cell, 5, 267-278.

18. Appleton,B.A., Loregian,A., Filman,D.J., Coen,D.M. and Hogle,J.M. (2004) The cytomegalovirus DNA polymerase subunit UL44 forms a C clamp-shaped dimer. Mol. Cell, 15, 233-244.

19. Murayama,K., Nakayama,S., Kato-Murayama,M., Akasaka,R., Ohbayashi,N., Kamewari-Hayami,Y., Terada,T., Shirouzu,M., Tsurumi,T. and Yokoyama,S. (2009) Crystal structure of Epstein-Barr virus DNA polymerase processivity factor BMRF1. J. Biol. Chem., 284, 35896-35905.

20. Ghosh,S., Hamdan,S.M., Cook,T.E. and Richardson,C.C. (2008) Interactions of Escherichia coli thioredoxin, the processivity factor, with bacteriophage T7 DNA polymerase and helicase. J. Biol. Chem., 283, 32077-32084.



21. Chen,Y.H., Lin,Y., Yoshinaga,A., Chhotani,B., Lorenzini,J.L., Crofts,A.A., Mei,S., Mackie,R.I., Ishino,Y. and Cann,I.K. (2009) Molecular analyses of a three-subunit euryarchaeal clamp loader complex from Methanosarcina acetivorans. J. Bacteriol., 191, 6539-6549.

22. Federici,B.A. and Bigot,Y. (2010) Evolution of immunosuppressive organelles from DNA viruses in insects. In Pontarotti,P. (ed.), Evolutionary Biology - Concepts, Molecular and Morphological Evolution. Springer Berlin Heidelberg, pp. 229-248.

23. Frith,M.C., Wan,R. and Horton,P. (2010) Incorporating sequence quality data into alignment improves DNA read mapping. Nucleic Acids Res., 38, e100.

24. Wernersson,R. (2006) Virtual ribosome-a comprehensive DNA translation tool with support for integration of sequence feature annotation. Nucleic Acids Res., 34, W385-W388.

25. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389-3402.

26. Marchler-Bauer,A., Anderson,J.B., Chitsaz,F., Derbyshire,M.K., DeWeese-Scott,C., Fong,J.H., Geer,L.Y., Geer,R.C., Gonzales,N.R., Gwadz,M. et al. (2009) CDD: specific functional annotation with the Conserved Domain Database. Nucleic Acids Res., 37, D205-D210.

27. Söding,J. (2005) Protein homology detection by HMM-HMM comparison. Bioinformatics, 21, 951-960.

28. Margelevicius,M., Laganeckas,M. and Venclovas,C. (2010) COMA server for protein distant homology search. Bioinformatics, 26, 1905-1906.

29. Kurowski,M.A. and Bujnicki,J.M. (2003) GeneSilico protein structure prediction meta-server. Nucleic Acids Res., 31, 3305-3307.

30. Frickey,T. and Lupas,A. (2004) CLANS: a Java application for visualizing protein families based on pairwise similarity. Bioinformatics, 20, 3702-3704.

31. Katoh,K., Misawa,K., Kuma,K. and Miyata,T. (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res., 30, 3059-3066.

32. Pei,J., Kim,B.H. and Grishin,N.V. (2008) PROMALS3D: a tool for multiple protein sequence and structure alignments. Nucleic Acids Res., 36, 2295-2300.

33. Margelevicius,M. and Venclovas,C. (2005) PSI-BLAST-ISS: an intermediate sequence search tool for estimation of the position-specific alignment reliability. BMC Bioinformatics, 6, 185.

34. Venclovas,C. and Margelevicius,M. (2009) The use of automatic tools and human expertise in template-based modeling of CASP8 target proteins. Proteins Struct. Funct. Bioinformatics, 77, 81-88.

35. Sali,A. and Blundell,T.L. (1993) Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol., 234, 779-815.

36. Wiederstein,M. and Sippl,M.J. (2007) ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Res., 35, W407-W410.

37. Rice,P., Longden,I. and Bleasby,A. (2000) EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet., 16, 276-277.

38. Dolinsky,T.J., Nielsen,J.E., McCammon,J.A. and Baker,N.A. (2004) PDB2PQR: an automated pipeline for the setup of Poisson-Boltzmann electrostatics calculations. Nucleic Acids Res., 32, W665-W667.

39. Hertveldt,K., Lavigne,R., Pleteneva,E., Sernova,N., Kurochkina,L., Korchevskii,R., Robben,J., Mesyanzhinov,V., Krylov,V.N. and Volckaert,G. (2005) Genome comparison of Pseudomonas aeruginosa large phages. J. Mol. Biol., 354, 536-545.

40. Mesyanzhinov,V.V., Robben,J., Grymonprez,B., Kostyuchenko,V.A., Bourkaltseva,M.V., Sykilinda,N.N., Krylov,V.N. and Volckaert,G. (2002) The genome of bacteriophage phiKZ of Pseudomonas aeruginosa. J. Mol. Biol., 317, 1-19.

41. ThomasJ.A., Rolando,M.R., Carroll,C.A., Shen,P.S., 
Belnap,D.M., Weintraub,S.T., Serwer,P. and Hardies,S.C. (2008) 
Characterization of Pseudomonas chlororaphis myovirus 



Nucleic Acids Research, 2011, Vol. 39, No. 19 8305 



201varphi2-l via genomic sequencing, mass spectrometry, and 
electron microscopy. Virology, 376, 330-338. 

42. Margelevicius,M. and Venclovas,C. (2010) Detection of distant 
evolutionary relationships between protein families using theory 
of sequence profile-profile comparison. BMC Bioinformatics, 11, 
89.

43. Weigel,C. and Seitz,H. (2006) Bacteriophage replication modules. 
FEMS Microbiol. Rev., 30, 321-381.

44. Andraos,N., Tabor,S. and Richardson,C.C. (2004) The highly 
processive DNA polymerase of bacteriophage T5. Role of the 
unique N and C termini. J. Biol. Chem., 279, 50609-50618. 

45. Druck Shudofsky,A.M., Silverman,J.E., Chattopadhyay,D. and 
Ricciardi,R.P. (2010) Vaccinia virus D4 mutants defective in 
processive DNA synthesis retain binding to A20 and DNA. 
J. Virol., 84, 12325-12335.

46. Moarefi,I., Jeruzalmi,D., Turner,J., O'Donnell,M. and Kuriyan,J. 
(2000) Crystal structure of the DNA polymerase processivity 
factor of T4 bacteriophage. J. Mol. Biol., 296, 1215-1223.

47. Iyer,L.M., Aravind,L. and Koonin,E.V. (2001) Common origin of 
four diverse families of large eukaryotic DNA viruses. J. Virol., 
75, 11720-11734. 

48. Boyle,K. and Traktman,P. (2009) Poxviruses. In Raney,K.D., 
Gotte,M. and Cameron,C.E. (eds). Viral Genome Replication. 
Springer US, pp. 225-247. 

49. Fischer,M.G., Allen,M.J., Wilson,W.H. and Suttle,C.A. (2010) 
Giant virus with a remarkable complement of genes infects 
marine zooplankton. Proc. Natl Acad. Sci. USA, 107, 
19508-19513.

50. Suhre,K. (2005) Gene and genome duplication in Acanthamoeba 
polyphaga Mimivirus. J. Virol., 79, 14095-14101.

51. Yoshida,T., Nagasaki,K., Takashima,Y., Shirai,Y., Tomaru,Y., 
Takao,Y., Sakamoto,S., Hiroishi,S. and Ogata,H. (2008) 
Ma-LMM01 infecting toxic Microcystis aeruginosa illuminates 
diverse cyanophage genome strategies. J. Bacteriol., 190, 
1762-1772.

52. Kool,M., Ahrens,C.H., Goldbach,R.W., Rohrmann,G.F. and 
Vlak,J.M. (1994) Identification of genes involved in DNA 
replication of the Autographa californica baculovirus. 
Proc. Natl Acad. Sci. USA, 91, 11212-11216.

53. Komazin-Meredith,G., Santos,W.L., Filman,D.J., Hogle,J.M., 
Verdine,G.L. and Coen,D.M. (2008) The positively charged 
surface of herpes simplex virus UL42 mediates DNA binding. 
J. Biol. Chem., 283, 6154-6161.

54. Loregian,A., Sinigalia,E., Mercorelli,B., Palu,G. and Coen,D.M. 
(2007) Binding parameters and thermodynamics of the interaction 
of the human cytomegalovirus DNA polymerase accessory 
protein, UL44, with DNA: implications for the processivity 
mechanism. Nucleic Acids Res., 35, 4779-4791.

55. Burtelow,M.A., Roos-Mattjus,P.M., Rauen,M., Babendure,J.R. 
and Karnitz,L.M. (2001) Reconstitution and molecular analysis of 
the hRad9-hHus1-hRad1 (9-1-1) DNA damage responsive 
checkpoint complex. J. Biol. Chem., 276, 25903-25909.

56. Bowman,G.D., O'Donnell,M. and Kuriyan,J. (2004) Structural 
analysis of a eukaryotic sliding DNA clamp-clamp loader 
complex. Nature, 429, 724-730. 

57. Yao,N., Coryell,L., Zhang,D., Georgescu,R.E., Finkelstein,J., 
Coman,M.M., Hingorani,M.M. and O'Donnell,M. (2003) 
Replication factor C clamp loader subunit arrangement within the 
circular pentamer and its attachment points to proliferating cell 
nuclear antigen. J. Biol. Chem., 278, 50744-50753. 

58. Yao,N.Y., Johnson,A., Bowman,G.D., Kuriyan,J. and 
O'Donnell,M. (2006) Mechanism of proliferating cell nuclear 
antigen clamp opening by replication factor C. J. Biol. Chem., 
281, 17528-17539.

59. Iyer,L.M., Leipe,D.D., Koonin,E.V. and Aravind,L. (2004) 
Evolutionary history and higher order classification of AAA+ 
ATPases. J. Struct. Biol., 146, 11-31.

60. Lopez de Saro,F.J. and O'Donnell,M. (2001) Interaction of the 
beta sliding clamp with MutS, ligase, and DNA polymerase I. 
Proc. Natl Acad. Sci. USA, 98, 8376-8380. 

61. Bedford,E., Tabor,S. and Richardson,C.C. (1997) The thioredoxin 
binding domain of bacteriophage T7 DNA polymerase 
confers processivity on Escherichia coli DNA polymerase I. 
Proc. Natl Acad. Sci. USA, 94, 479-484. 

62. Dahlen,M., Olsson,T., Kanter-Smoler,G., Ramne,A. and 
Sunnerhagen,P. (1998) Regulation of telomere length by 
checkpoint genes in Schizosaccharomyces pombe. Mol. Biol. Cell, 
9, 611-621. 

63. Khair,L., Chang,Y.T., Subramanian,L., Russell,P. and 
Nakamura,T.M. (2010) Roles of the checkpoint sensor clamp 
Rad9-Rad1-Hus1 (9-1-1) complex and the clamp loaders 
Rad17-RFC and Ctf18-RFC in Schizosaccharomyces pombe 
telomere maintenance. Cell Cycle, 9, 2237-2248. 

64. Schmidt,S.L., Gomes,X.V. and Burgers,P.M. (2001) ATP 
utilization by yeast replication factor C. III. The ATP-binding 
domains of Rfc2, Rfc3, and Rfc4 are essential for DNA 
recognition and clamp loading. J. Biol. Chem., 276, 34784-34791.

65. McCutcheon,J.P. (2010) The bacterial essence of tiny symbiont 
genomes. Curr. Opin. Microbiol., 13, 73-78. 

66. Konagurthu,A.S., Whisstock,J.C., Stuckey,P.J. and Lesk,A.M. 
(2006) MUSTANG: a multiple structural alignment algorithm. 
Proteins, 64, 559-574. 



